KANDA DATA

How to Test the Normality Assumption in Linear Regression and Interpreting the Output

By Kanda Data / Date: May 13, 2022
Assumptions of Linear Regression

The normality test is one of the assumption tests for linear regression estimated with the ordinary least squares (OLS) method. It is intended to determine whether the residuals are normally distributed.

Normality of the residuals is not what makes OLS the best linear unbiased estimator (that property follows from the Gauss-Markov assumptions), but it is needed for the t and F tests to be exact, especially in small samples. A regression model that fulfills the required assumptions is therefore more likely to give correct hypothesis testing results.

Note that the normality test in linear regression is applied to the residuals, not to the raw data of each variable. The assumption in OLS linear regression is that the residuals are normally distributed.

Before running the normality test, it is recommended that you formulate the hypotheses first: a null hypothesis and an alternative hypothesis.

We then use statistical software to test the null hypothesis. The hypotheses for the normality test can be written as follows:

H0: Residuals are normally distributed

H1: Residuals are not normally distributed

Do you still remember what a residual is? A residual is the difference between the actual value of Y and the value of Y predicted by the regression model. So how do we test the hypothesis?
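In symbols, using the variables of the mini-research example in this article (Y for rice consumption, X1 for income, X2 for population), the residual for observation i can be written as below. The coefficient symbols b0, b1, and b2 are notation introduced here for illustration and do not appear in the original article.

```latex
\[
e_i = Y_i - \hat{Y}_i, \qquad \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}
\]
```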

To test the hypothesis, you can choose whichever analysis tool you find easiest to use. In this article, I test for normality using the Shapiro-Wilk test.

Shapiro and Wilk proposed this test in 1965. It is well suited to small samples.
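For reference, the Shapiro-Wilk statistic is usually written as shown here, where x_(i) are the sample values sorted from smallest to largest (in our case, the sorted residuals), x-bar is their mean, and a_i are constants derived from the expected order statistics of a normal sample. Values of W close to 1 are consistent with normality, while small values point away from it.

```latex
\[
W = \frac{\left(\sum_{i=1}^{n} a_i\, x_{(i)}\right)^{2}}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^{2}}
\]
```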

The test criterion compares the p-value with an alpha (significance level) set in advance. For example, if we set an alpha of 0.05 (5%), the criteria for testing the hypothesis are:

P-value > 0.05: H0 is not rejected

P-value <= 0.05: H0 is rejected (H1 is accepted)

Normality Test Using Mini Research

To give a more in-depth understanding, I suggest you practice with the data provided here. The mini-research example used on this occasion is a study that aims to determine the effect of income and population on rice consumption.

In this mini-research, income and population are the independent variables and rice consumption is the dependent variable. The data for the exercise can be seen in the table below:

How to Run the Shapiro-Wilk Normality Test in STATA

First, open the STATA application. Below the menu options you will find several icons. Select the icon showing a table with a pencil (Data Editor).

The “Data Editor (Edit)” window will open. In the next step, enter all the data given above.

Enter the rice consumption data (Y) in the first column, then the income (X1) and population (X2) data in the second and third columns. Next, set the name and label of each variable on the top right, as shown below:
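If you prefer typing commands to filling in the variable properties by hand, the naming and labeling step can be sketched as follows. This assumes the Data Editor gave the three new columns the default names var1, var2, and var3. The label texts are only illustrative wording and are not taken from the original data file.

```stata
* Assumption: the three columns still carry Stata's default names var1-var3
rename var1 Y
rename var2 X1
rename var3 X2

* Illustrative labels (the wording is hypothetical)
label variable Y  "Rice consumption"
label variable X1 "Income"
label variable X2 "Population"

* Confirm the names, storage types, and labels
describe
```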

At this stage the data has been entered in STATA and is ready to be analyzed. Because the normality test is run on the residuals, we must first obtain the residual values.

To obtain the residuals, you need to run a regression analysis first. Type the following command in STATA:

regress Y X1 X2

Press Enter, and the linear regression results for the variables we entered will appear. To obtain the residual values, type the following command in STATA:

predict res, residuals

Press Enter, and the residuals will be stored in a new variable named res. To check the residual values, open the Data Editor again. The residual values can be seen in the image below:
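As an alternative to opening the Data Editor, the residuals can also be inspected from the command line. This is a minimal sketch that assumes the regression has just been run and the residual variable is named res. The variable name yhat is a hypothetical name introduced here only for illustration.

```stata
* Fitted values from the last regression (yhat is an illustrative name)
predict yhat, xb

* Show actual Y, predicted Y, and the residual side by side
list Y yhat res

* With a constant in the model, the mean of the OLS residuals
* should be zero up to rounding error
summarize res
```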

To test the normality of the residuals with the Shapiro-Wilk test, type the following command in STATA:

swilk res

Press Enter, and the Shapiro-Wilk normality test results will appear.
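The swilk command also leaves its results in memory, which can be convenient if you want to reuse the numbers instead of reading them off the screen. A minimal sketch, assuming swilk res was the last command run:

```stata
* List everything swilk stored: r(N), r(W), r(V), r(z), r(p)
return list

* Print the W statistic and the p-value (reported as Prob>z in the output)
display "W = " r(W) "   p-value = " r(p)
```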

Normality Test Output and Interpreting the Output

The output of the Shapiro-Wilk normality test from STATA can be seen in the table below:

According to the table above, the Prob>z value is 0.68364. This p-value is greater than 0.05, so the null hypothesis is not rejected.

In terms of the hypotheses formulated earlier, the test gives no evidence against normality. Thus it can be concluded that the residuals are normally distributed.
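The decision rule can also be written out as a few lines of code. This sketch assumes the residual variable res from the steps above and the 5% alpha used in this article. The local macro name alpha is just an illustrative choice.

```stata
* Significance level set in advance
local alpha = 0.05

* Re-run the test quietly so that r(p) is available
quietly swilk res

if r(p) > `alpha' {
    display "p-value = " r(p) ": H0 is not rejected, residuals are consistent with normality"
}
else {
    display "p-value = " r(p) ": H0 is rejected, residuals are not normally distributed"
}
```

Run the block in one go, for example from a do-file, so that the local macro is still defined when the if condition is evaluated.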

Because the residuals are normally distributed, the regression model fulfills the normality assumption. Next, we need to test the other assumptions, such as the absence of multicollinearity and heteroscedasticity. For further reading on linear regression assumptions, see Introductory Econometrics: A Modern Approach.
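As a pointer to those next checks, Stata provides post-estimation commands for both. The sketch below is not part of the original walkthrough. It assumes the same variables Y, X1, and X2 and only shows where the commands fit.

```stata
* Re-fit the model so the post-estimation commands refer to it
quietly regress Y X1 X2

* Variance inflation factors, a common multicollinearity diagnostic
estat vif

* Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
estat hettest
```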

Well, that is all for this article from Kanda Data. I hope it will be useful for all of us. See you in the next article!

Copyright KANDA DATA 2025. All Rights Reserved